(54) Title: SYSTEM AND METHOD FOR PLAYBACK OF VIDEO WITH CLOSED CAPTIONED TEXT

(57) Abstract: There is disclosed a video frame grabber capable of capturing a plurality of video frames from a played-back video
signal during fast forward mode and reverse mode. A closed caption text detector is provided that is capable of detecting closed
caption text in the played-back video signal. A memory is provided for storing the detected closed caption text and a plurality of key
frames. The key frames comprise selected ones captured by the video frame grabber corresponding to detected closed caption text.
A video processor retrieves a line of closed caption text and at least one key frame corresponding to the first line of closed caption
text and displays the line of closed caption text with the key frame on the display screen.

System and method for playback of video with closed captioned text



TECHNICAL FIELD OF THE INVENTION 

The present invention is directed, in general, to video playback devices and 
more specifically, to a system for displaying closed caption text and related video in fast 
forward and reverse modes. 

BACKGROUND OF THE INVENTION 

A wide variety of video playback devices are available in the marketplace.
Most people own, or are familiar with, a video cassette recorder (VCR), also referred to as a
video tape recorder (VTR). More recently, video recorders that use computer magnetic hard
disks rather than magnetic cassette tapes to store video programs have appeared in the
market. For example, the ReplayTV(TM) recorder and the TiVO(TM) recorder digitally record
television programs on hard disk drives using, for example, MPEG-2 compression.
Additionally, some video playback devices may record on a readable/writable, digital
versatile disk (DVD) rather than a magnetic disk.

Closed captioning service is widely used on television programs to provide
textual transcripts for the audio speech segments. Viewers in a noisy environment and
hearing impaired viewers find this feature especially useful. Text is usually displayed in a
box in the lower portion of a television screen. After recording a broadcast, a user may wish
to locate a point or segment within the tape or disk recording to view. The user may wish to
skip a commercial or find a desired scene in the recorded broadcast. The traditional method
for locating a segment is "fast forward" or "rewind" (both considered fast play modes),
which accelerates playback of the video information in the forward or backward direction,
respectively.

In traditional VCRs as well as digital playback devices, the closed captioning
feature is usually disabled and, at most, only a sample of frames may be displayed at a high
rate during fast forward or rewind mode. This prevents handicapped users from accessing
the textual information and wastes an important and valuable source of content that may
enhance the browsing quality of the video.



WO 02/32128 PCT/EP01/11157

There is therefore a need in the art for a system for viewing text during fast 
forward and rewind. There is a further need in the art for synchronizing closed caption text 
with video frames during fast play modes. 

SUMMARY OF THE INVENTION

To address the above-discussed deficiencies of the prior art, it is a primary
object of the present invention to provide, for use in a video recorder utilizing recording tape,
hard disk or solid state memory, a system and method for display of closed caption text
during fast play. There is disclosed a video frame grabber capable of capturing a plurality of
video frames from a played-back video signal during fast forward mode and reverse mode. A
closed caption text detector is provided that is capable of detecting closed caption text in the
played-back video signal. A memory is provided for storing the detected closed caption text
and a plurality of key frames. The key frames comprise selected ones captured by the video
frame grabber corresponding to detected closed caption text. A video processor retrieves a
line of closed caption text and at least one key frame corresponding to the first line of closed
caption text and displays the line of closed caption text with the key frame on the display
screen.

According to an advantageous embodiment of the present invention, the video
processor is capable of displaying a plurality of lines of closed caption text in a
selected window on the display screen.

According to one embodiment of the present invention, the video processor is capable of 
scrolling the plurality of lines of closed caption text in the selected window on the display 
screen. 

According to another embodiment of the present invention, the video
processor is capable of displaying a first key frame in a first portion of the display screen and
a second key frame in a second portion of the display screen.

According to yet another embodiment of the present invention, the video
processor is capable of displaying the first key frame in the first portion of the display screen
when the first line of closed caption line text appears in a selected window on the display
screen.

According to still another embodiment of the present invention, the video 
processor is capable of displaying the first key frame in the second portion of the display 
screen when the first line of closed caption line text scrolls to a new position in the selected 
window on the display screen. 

According to still yet another embodiment of the present invention, the video
processor is capable of displaying lines of closed caption text and key frames on the display
screen at a variable rate determined by commands received from a user of the video play
back device.

The foregoing has outlined rather broadly the features and technical
advantages of the present invention so that those skilled in the art may better understand the
detailed description of the invention that follows. Additional features and advantages of the
invention will be described hereinafter that form the subject of the claims of the invention.
Those skilled in the art should appreciate that they may readily use the conception and the
specific embodiment disclosed as a basis for modifying or designing other structures for
carrying out the same purposes of the present invention. Those skilled in the art should also
realize that such equivalent constructions do not depart from the spirit and scope of the
invention in its broadest form.

Before undertaking the DETAILED DESCRIPTION, it may be advantageous
to set forth definitions of certain words and phrases used throughout this patent document:
the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without
limitation; the term "or," is inclusive, meaning and/or; the phrases "associated with" and
"associated therewith," as well as derivatives thereof, may mean to include, be included
within, interconnect with, contain, be contained within, connect to or with, couple to or with,
be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or
with, have, have a property of, or the like; and the term "controller" means any device,
system or part thereof that controls at least one operation; such a device may be implemented
in hardware, firmware or software, or some combination of at least two of the same. It
should be noted that the functionality associated with any particular controller may be
centralized or distributed, whether locally or remotely. In particular, a controller may
comprise one or more data processors, and associated input/output devices and memory, that
execute one or more application programs and/or an operating system program. Definitions
for certain words and phrases are provided throughout this patent document; those of
ordinary skill in the art should understand that in many, if not most instances, such
definitions apply to prior, as well as future uses of such defined words and phrases.



BRIEF DESCRIPTION OF THE DRAWINGS 



For a more complete understanding of the present invention, and the 
advantages thereof, reference is now made to the following descriptions taken in conjunction 
with the accompanying drawings, wherein like numbers designate like objects, and in which: 

FIGURE 1 illustrates an exemplary video playback device and a television set
according to one embodiment of the present invention;

FIGURE 2 illustrates in greater detail the exemplary video playback device
according to one embodiment of the present invention;

FIGURE 3 illustrates a television screen on which closed caption text and
associated video frames are displayed according to one embodiment of the present invention;

FIGURE 4 illustrates the contents of the closed caption memory in the
exemplary video playback device according to one embodiment of the present invention; and

FIGURE 5 is a flow diagram illustrating the operation of the exemplary video 
playback device according to one embodiment of the present invention. 

DETAILED DESCRIPTION OF THE INVENTION

FIGURES 1 through 5, discussed below, and the various embodiments used to
describe the principles of the present invention in this patent document are by way of
illustration only and should not be construed in any way to limit the scope of the invention.
Those skilled in the art will understand that the principles of the present invention may be
implemented in any suitably arranged video playback device.

FIGURE 1 illustrates exemplary video playback device 150 and television
set 105 according to one embodiment of the present invention. Video playback device 150
receives incoming television signals from an external source, such as a cable television
service provider (Cable Co.), a local antenna, the Internet, or a DVD or VHS tape player, and
transmits a viewer-selected channel to television set 105. In RECORD mode, video playback
device 150 may demodulate an incoming radio frequency (RF) television signal to produce a
baseband video signal that is recorded and stored on a storage medium within or connected to
video playback device 150. In PLAY mode, video playback device 150 reads a stored
baseband video signal (i.e., program) selected by the user from the storage medium and
transmits it to television set 105.

For example, if video playback device 150 is a video cassette recorder (VCR),
also referred to as a video tape recorder (VTR), video playback device 150 stores and
retrieves the incoming television signals to and from a magnetic cassette tape. If video
playback device 150 is a disk drive-based device, such as a ReplayTV(TM) recorder or a TiVO(TM)
recorder, video playback device 150 stores and retrieves the incoming television signals to
and from a computer magnetic hard disk rather than a magnetic cassette tape. In still other
embodiments, video playback device 150 may store and retrieve from a local read/write
(R/W) digital versatile disk (DVD) or R/W CD-ROM. Thus, the local storage medium may
be fixed (i.e., hard disk drive) or removable (i.e., DVD, CD-ROM).

Video playback device 150 comprises infrared (IR) sensor 160 that receives
commands (such as Channel Up, Channel Down, Volume Up, Volume Down, Record, Play,
Fast Forward (FF), Reverse, and the like) from a remote control device operated by the
viewer. Television set 105 is a conventional television comprising screen 110, infrared (IR)
sensor 115, and one or more manual controls 120 (indicated by a dotted line). IR sensor 115
also receives commands (such as volume up, volume down, power ON/OFF) from a remote
control device operated by the viewer.

It should be noted that video playback device 150 is not limited to receiving a
particular type of incoming television signal from a particular type of source. As noted
above, the external source may be a cable service provider, a conventional RF broadcast
antenna, a satellite dish, an Internet connection, or another local storage device, such as a
DVD player or a VHS tape player. In some embodiments, video playback device 150 may
not even be able to record, but may be limited to playing back television signals that are
retrieved from a removable DVD or CD-ROM. Thus, the incoming signal may be a digital
signal, an analog signal, or Internet protocol (IP) packets. However, for the purposes of
simplicity and clarity in explaining the principles of the present invention, the descriptions
that follow shall generally be directed to an embodiment in which video playback device 150
receives incoming television signals (analog and/or digital) from a cable service provider.
Nonetheless, those skilled in the art will understand that the principles of the present
invention may readily be adapted for use with wireless broadcast television signals, local
storage systems, an incoming stream of IP packets containing MPEG data, and the like.

FIGURE 2 illustrates exemplary video playback device 150 in greater detail
according to one embodiment of the present invention. Video playback device 150
comprises IR sensor 160, video processor 210, MPEG2 encoder 220, hard disk drive 230,
MPEG2 decoder/NTSC encoder 240, and video recorder (VR) controller 250. Video
playback device 150 further comprises frame grabber 260, closed captioned detector 270, and
closed captioned memory (or buffer) 280. VR controller 250 directs the overall operation of
video playback device 150, including View mode, Record mode, Play mode, Fast Forward
(FF) mode, and Reverse mode, among others.

In View mode, VR controller 250 causes the incoming television signal from
the cable service provider to be demodulated and processed by video processor 210 and
transmitted to television set 105, without storing or retrieving from hard disk drive 230.
Video processor 210, which may be, for example, a TriMedia (TM) 1100 media processor,
contains radio frequency (RF) front-end circuitry for receiving incoming television signals
from the cable service provider, tuning to a user-selected channel, and converting the selected
RF signal to a baseband television signal (e.g., super video signal) suitable for display on
television set 105. Video processor 210 also is capable of receiving a conventional NTSC
signal from MPEG2 decoder/NTSC encoder 240 and video frames from CC memory 280 and
transmitting a baseband television signal (e.g., super video signal) to television set 105.

In Record mode, VR controller 250 causes the incoming television signal to be
stored on hard disk drive 230. Under the control of VR controller 250, MPEG2 encoder 220
receives the incoming television signal from the cable service provider and converts the
received RF signal to MPEG format for storage on hard disk drive 230. In Play mode, VR
controller 250 directs hard disk drive 230 to stream the stored television signal (i.e., program)
to MPEG2 decoder/NTSC encoder 240, which converts the MPEG2 data from hard disk
drive 230 to, for example, a super video (S-Video) signal that video processor 210 transmits
to television set 105.

It should be noted that the choice of the MPEG2 standard for MPEG2
encoder 220 and MPEG2 decoder/NTSC encoder 240 is by way of illustration only. In
alternate embodiments of the present invention, the MPEG encoder and decoder may comply
with one or more of the MPEG-1, MPEG-2, MPEG-4, and MPEG-7 standards.

For the purposes of this application and the claims that follow, hard disk
drive 230 is defined to include any mass storage device that is both readable and writable,
including conventional magnetic disk drives and optical disk drives for read/write digital
versatile disks (DVD-RW), re-writable CD-ROMs, VCR tapes and the like. In fact, hard disk
drive 230 need not be fixed in the conventional sense that it is permanently embedded in video
playback device 150. Rather, hard disk drive 230 includes any mass storage device that is
dedicated to video playback device 150 for the purpose of storing recorded video programs.
Thus, hard disk drive 230 may include an attached peripheral drive or removable disk drives
(whether embedded or attached), such as a juke box device that holds read/write DVDs or
re-writable CD-ROMs. Furthermore, in an advantageous embodiment of the present invention,
hard disk drive 230 may include external mass storage devices that video playback
device 150 may access and control via a network connection (e.g., Internet protocol (IP)
connection), including, for example, a disk drive in the user's home personal computer (PC)
or a disk drive on a server at the user's Internet service provider (ISP).

During Play mode, VR controller 250 may receive a Fast Forward (FF) or
Reverse command from a user via IR sensor 160. In FF or Reverse modes, video playback
device 150 is capable of displaying closed caption (CC) text in a CC window on television
screen 110 using frame grabber 260, closed caption (CC) detector 270, and closed caption
(CC) memory 280. When an FF or Reverse command is received, VR controller 250 causes
hard disk drive 230 and MPEG2 decoder/NTSC encoder 240 to play video at a faster forward
speed or in reverse, accordingly. However, VR controller 250 also directs video processor 210 to
stop receiving the output of MPEG2 decoder/NTSC encoder 240 as a source of the video
signal. Instead, video processor 210 is switched to receive the video frames
from the output of CC memory 280 as the source of the video signal.
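
The source switching described above can be sketched as a small selection function. This is only an illustration under stated assumptions, not the implementation of the disclosed device: the names `Mode`, `Source`, and `select_source` are hypothetical, and the sketch assumes that video processor 210 is the component that chooses among its inputs, consistent with the earlier statement that it can receive both the NTSC signal from decoder 240 and video frames from CC memory 280.

```python
from enum import Enum


class Mode(Enum):
    VIEW = "view"
    PLAY = "play"
    FAST_FORWARD = "ff"
    REVERSE = "reverse"


class Source(Enum):
    TUNER = "rf_front_end"          # live signal from the cable service provider
    DECODER = "mpeg2_decoder_240"   # normal-speed playback from the hard disk
    CC_MEMORY = "cc_memory_280"     # key frames stored with their caption lines


def select_source(mode: Mode) -> Source:
    """Return the input the video processor should display for a given mode."""
    if mode in (Mode.FAST_FORWARD, Mode.REVERSE):
        # In fast play modes the decoder output is no longer shown directly.
        return Source.CC_MEMORY
    if mode is Mode.PLAY:
        return Source.DECODER
    return Source.TUNER
```

In this sketch the decoder keeps running during fast play so that frames and captions can still be captured from its output; only the displayed source changes.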

Frame grabber 260 captures and stores video frames from the output of
MPEG2 decoder/NTSC encoder 240. CC detector 270 detects CC text in the NTSC output
signal of MPEG2 decoder/NTSC encoder 240. CC text is typically inserted in the blanking
interval at the end of line 21 of the video signal. As will be explained below in greater detail,
CC detector 270 uses a time stamp associated with each line of CC data to identify a selected
key frame of video corresponding to the CC text. CC detector 270 stores each line of CC text
and the time stamp in CC memory 280 and causes frame grabber 260 to store the selected
video frame for each line of CC text in CC memory 280. Thereafter, the key frames and the
CC text are transferred to video processor 210 according to the speed and play back direction
(FF or Reverse) selected by the user. Video processor 210 plays the key frames as a
sequence of still frames that appear on television screen 110 contemporaneously with the
corresponding CC text.

FIGURE 3 illustrates television screen 110 on which closed caption text and
associated video frames are displayed according to one embodiment of the present invention.
Television screen 110 displays video frames 310 and 315 in pairs, labeled Frame A and
Frame B respectively, in a top portion of television screen 110. Closed caption text scrolls in
CC text window 305 in a bottom portion of television screen 110. The display position of the
frames is arbitrary and the frames and text window may be located at any predetermined
location on television screen 110.

The CC text lines in CC text window 305 may scroll at different speeds
and each line is synchronized with an appropriate key frame according to a synchronization
scheme (e.g., synchronizing key frames and CC text to match time stamps of CC text and key
frames). Depending on whether the mode is forward or reverse, the frames are moved from
one side of the screen to the other and a new frame is loaded as the CC text scrolls. By
displaying two key frames at a time, continuity is provided in the video display and there is
an efficient use of screen space.
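
The movement of key frames between the two display positions can be sketched as follows. The function name `shift_frames` and the string frame identifiers are hypothetical, and the direction of movement is an assumption: here a new key frame enters at Frame A in forward mode while the previous one moves to Frame B, and reverse mode mirrors this. The description above leaves the actual direction open.

```python
from typing import Optional, Tuple

# (Frame A, Frame B); None means the position is still empty.
Pair = Tuple[Optional[str], Optional[str]]


def shift_frames(pair: Pair, new_frame: str, forward: bool) -> Pair:
    """Slide the two-frame display by one key frame as the caption text scrolls."""
    frame_a, frame_b = pair
    if forward:
        # Assumed: the newest frame appears in Frame A, the older one moves to Frame B.
        return (new_frame, frame_a)
    # Reverse mode mirrors the movement: Frame B moves back to Frame A.
    return (frame_b, new_frame)


pair: Pair = (None, None)
for key_frame in ["kf1", "kf2", "kf3"]:
    pair = shift_frames(pair, key_frame, forward=True)
# pair is now ("kf3", "kf2")
```

Because one frame is carried over on every update, the viewer always sees the previous key frame next to the new one, which is the continuity the two-frame layout is meant to provide.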

In the present embodiment, the user presses the Fast Forward or Reverse (or
Rewind) buttons several times to indicate fast forward or rewind speed. Time stamp data for
each line of CC text is used to determine which frame to display for that line of CC text. A
specific key frame may be picked with a fixed sampling rate or the key frame may be
selected from a predetermined list of key frames with a "nearest-neighbor" scheme. If the
user presses "Pause" during FF or reverse modes, the current video frames in Frame A and
Frame B are maintained during the pause period. Similarly, closed caption text scrolling in
CC text window 305 is held frozen when the user presses "Pause."
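
The two key frame selection options mentioned above can be sketched as follows. This is a minimal illustration rather than the disclosed method: the function names are hypothetical, time stamps are assumed to be numbers in seconds, and the predetermined key frame list is assumed to be sorted by time stamp so that a binary search can be used.

```python
from bisect import bisect_left
from typing import List, Sequence


def sample_fixed_rate(frame_times: Sequence[float], step: int) -> List[float]:
    """Fixed sampling rate: keep every `step`-th captured frame as a key frame."""
    return list(frame_times[::step])


def nearest_key_frame(key_times: Sequence[float], cc_time: float) -> int:
    """Nearest-neighbor scheme: index of the key frame closest in time to a CC line.

    `key_times` must be non-empty and sorted in ascending order.
    When two key frames are equally close, the earlier one is chosen.
    """
    if not key_times:
        raise ValueError("key_times must not be empty")
    pos = bisect_left(key_times, cc_time)
    if pos == 0:
        return 0
    if pos == len(key_times):
        return len(key_times) - 1
    before, after = key_times[pos - 1], key_times[pos]
    return pos - 1 if cc_time - before <= after - cc_time else pos


key_times = sample_fixed_rate([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], step=2)
# key_times is [0.0, 2.0, 4.0, 6.0]
index = nearest_key_frame(key_times, cc_time=3.5)
# index is 2, the key frame stamped 4.0
```

The tie-breaking rule toward the earlier frame is a design choice of this sketch; the description does not state how ties are resolved.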

FIGURE 4 illustrates the contents of closed caption memory 280 in exemplary
video playback device 150 according to one embodiment of the present invention. Closed
caption (CC) memory 280 stores N lines of closed caption text, including exemplary closed
caption lines 401-404, which are labeled CC Line 1, CC Line 2, CC Line 3, and CC Line N.
CC memory 280 also stores N time stamps, including exemplary time stamps 411-414,
which are labeled TS1, TS2, TS3 and TSN respectively. Finally, CC memory 280 stores N
key frames, including exemplary key frames 421-424, which are labeled Key Frame 1, Key
Frame 2, Key Frame 3 and Key Frame N, respectively.

TS1 and Key Frame 1 are the time stamp and the key frame that correspond to
CC Line 1. TS2 and Key Frame 2 are the time stamp and the key frame that correspond to
CC Line 2. TS3 and Key Frame 3 are the time stamp and the key frame that correspond to
CC Line 3. Finally, TSN and Key Frame N are the time stamp and the key frame that
correspond to CC Line N.
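
The layout of FIGURE 4 can be modeled as a list of records, each holding one caption line together with its time stamp and key frame. The class and field names below are hypothetical, and the representation of a time stamp as seconds and of a key frame as opaque bytes is an assumption made only for the sketch.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class CCEntry:
    text: str          # CC Line i
    time_stamp: float  # TS i, assumed here to be seconds from the program start
    key_frame: bytes   # Key Frame i, assumed here to be an opaque encoded image


class CCMemory:
    """Holds N entries, each pairing a caption line with its time stamp and key frame."""

    def __init__(self) -> None:
        self._entries: List[CCEntry] = []

    def store(self, entry: CCEntry) -> None:
        self._entries.append(entry)

    def __len__(self) -> int:
        return len(self._entries)

    def play(self, forward: bool = True) -> Iterator[CCEntry]:
        """Yield entries in time stamp order, or in the opposite order for Reverse."""
        ordered = sorted(self._entries, key=lambda e: e.time_stamp)
        return iter(ordered if forward else ordered[::-1])


memory = CCMemory()
memory.store(CCEntry("CC Line 1", 1.0, b"key-frame-1"))
memory.store(CCEntry("CC Line 2", 2.5, b"key-frame-2"))
memory.store(CCEntry("CC Line 3", 4.0, b"key-frame-3"))
reverse_lines = [entry.text for entry in memory.play(forward=False)]
# reverse_lines is ["CC Line 3", "CC Line 2", "CC Line 1"]
```

Keeping the three items in one record means a caption line can never be displayed with the key frame of a different line, whichever direction the entries are read in.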

Referring now to FIGURE 5, a flow diagram 500 illustrates the operation of
exemplary video playback device 150 according to one embodiment of the present invention.
VR controller 250 is depicted as receiving a fast forward command from the user's remote
control for video playback device 150 (process step 505). Frame grabber 260 then begins storing all
video frames from the video media. Additionally, CC detector 270 detects closed caption
text from the video signal that corresponds to the video frames being stored (process step
510).

Key frames are then selected from the stored video frames and stored in CC
memory 280. In addition to the key frames, CC text lines and time stamps for both text lines
and key frames are stored in CC memory 280 (process step 515). Video processor 210
begins displaying one or more key frames along with corresponding CC text on TV screen
110 at a speed and play back direction determined by user inputs on the user's remote control
(process step 520).
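
The display rate used in process step 520 can be sketched as a mapping from the number of Fast Forward or Reverse button presses to the time each caption line and key frame stays on screen. The mapping itself is an assumption: the description says only that repeated presses indicate the speed, so the base dwell time of 2.0 seconds and the doubling per press below are hypothetical values, as are the function names.

```python
from typing import Iterable, Iterator, Tuple

BASE_DWELL_S = 2.0  # assumed time one caption line stays on screen after one press


def dwell_seconds(presses: int) -> float:
    """Map button presses to a per-line dwell time (assumed: each press doubles speed)."""
    if presses < 1:
        raise ValueError("at least one press is needed to enter a fast play mode")
    return BASE_DWELL_S / (2 ** (presses - 1))


def display_schedule(
    entries: Iterable[Tuple[str, str]], presses: int
) -> Iterator[Tuple[float, str, str]]:
    """Yield (display time, caption line, key frame id) for already ordered entries."""
    dwell = dwell_seconds(presses)
    for i, (caption, key_frame_id) in enumerate(entries):
        yield (i * dwell, caption, key_frame_id)


schedule = list(display_schedule([("CC Line 1", "kf1"), ("CC Line 2", "kf2")], presses=3))
# schedule is [(0.0, "CC Line 1", "kf1"), (0.5, "CC Line 2", "kf2")]
```

The entries are expected in the order they should appear, so the same schedule function serves both forward and reverse browsing.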

At least two frames for displaying video are presented on a television or
display screen. In another predetermined position on the screen, a frame is positioned to
display speech, from the frames, in the form of readable text. The frames are selected frames
and the text is speech that corresponds to the frames that are displayed. By selecting key
frames and text for display, the problem associated with scrolling all the frames in a
recording to find a particular spot is reduced dramatically.

Although the present invention has been described in detail, those skilled in
the art should understand that they can make various changes, substitutions and alterations
herein without departing from the spirit and scope of the invention in its broadest form.

CLAIMS:



1. For use in a video playback device, an apparatus for displaying closed caption
text during fast forward mode and rewind mode on a display screen coupled to the video
playback device, said apparatus comprising:

a video frame grabber (260) capable of capturing a plurality of video frames
from a played-back video signal during fast forward mode and reverse mode;

a closed caption text detector (270) capable of detecting closed caption text in
said played-back video signal;

a memory (280) for storing said detected closed caption text and a plurality of
key frames, said key frames comprising selected ones of said plurality of video frames
captured by said video frame grabber (260) corresponding to said detected closed caption
text; and

a video processor (210) capable of retrieving from said memory (280) a first
line of closed caption text and at least one key frame corresponding to said first line of closed
caption text and displaying said first line of closed caption text and said at least one key
frame on said display screen (105).

2. The apparatus as set forth in Claim 1 wherein said video processor (210) is 
capable of displaying a plurality of lines of closed caption text in a selected window on said 
display screen (105). 

3. The apparatus as set forth in Claim 2 wherein said video processor (210) is 
capable of scrolling said plurality of lines of closed caption text in said selected window on 
said display screen (105). 

4. The apparatus as set forth in Claim 1 wherein said video processor (210) is
capable of displaying a first key frame in a first portion of said display screen (105) and a
second key frame in a second portion of said display screen (105).

5. The apparatus as set forth in Claim 4 wherein said video processor (210) is
capable of displaying said first key frame in said first portion of said display screen (105)
when said first line of closed caption line text appears in a selected window on said display
screen (105).

6. The apparatus as set forth in Claim 5 wherein said video processor (210) is 
capable of displaying said first key frame in said second portion of said display screen (105) 
when said first line of closed caption line text scrolls to a new position in said selected 
window on said display screen (105). 

7. The apparatus as set forth in Claim 1 wherein said video processor (210) is 
capable of displaying lines of closed caption text and key frames on said display screen (105) 
at a variable rate determined by commands received from a user of said video play back 
device. 

8. A video play back device comprising:

a storage device capable of storing thereon a plurality of video signals;

video playback circuitry capable of retrieving a first selected video signal
stored on said storage device and generating therefrom a played-back video signal capable of
being displayed on a display screen (105) coupled to said video play back device;

said video playback device further comprising an apparatus as set forth in
Claim 1.

9. For use in a video playback device, a method for displaying closed caption
text during fast forward mode and rewind mode on a display screen (105) coupled to the
video playback device, the method comprising the steps:

capturing a plurality of video frames from a played-back video signal during
fast forward mode and reverse mode;

detecting closed caption text in the played-back video signal;

storing in a memory (280) the detected closed caption text and a plurality of
key frames, the key frames comprising selected ones of the captured plurality of video frames
corresponding to the detected closed caption text; and

retrieving from the memory (280) a first line of closed caption text and at least
one key frame corresponding to the first line of closed caption text and displaying the first
line of closed caption text and the at least one key frame on the display screen (105).

10. The method as set forth in Claim 9 wherein said video processor (210) is
capable of displaying a plurality of lines of closed caption text in a selected window on said
display screen (105).

11. The method as set forth in Claim 10 further comprising the step of scrolling
said plurality of lines of closed caption text in said selected window on said display screen
(105).

12. The method as set forth in Claim 9 further comprising the step of displaying a
first key frame in a first portion of said display screen (105) and a second key frame in a
second portion of said display screen (105).

13. The method as set forth in Claim 12 further comprising the step of displaying
said first key frame in said first portion of said display screen (105) when said first line of
closed caption line text appears in a selected window on said display screen (105).

14. The method as set forth in Claim 13 further comprising the step of displaying
said first key frame in said second portion of said display screen (105) when said first line of
closed caption line text scrolls to a new position in said selected window on said display
screen (105).

15. The method as set forth in Claim 9 further comprising the step of displaying
lines of closed caption text and key frames on said display screen (105) at a variable rate
determined by commands received from a user of said video play back device.